DETAILED ACTION

Continued Examination Under 37 CFR 1.114 

1. A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this 
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action 
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 
9/5/2006 has been entered. 

Remarks 

2. Receipt of Applicant's Amendment, filed on 07/17/2008, is acknowledged. The 
amendment includes the amending of claims 1, 14, and 18. 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. This application currently names joint inventors. In considering patentability of 
the claims under 35 U.S.C. 103(a), the examiner presumes that the subject matter of 
the various claims was commonly owned at the time any inventions covered therein 
were made absent any evidence to the contrary. Applicant is advised of the obligation 
under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was 
not commonly owned at the time a later invention was made in order for the examiner to 
consider the applicability of 35 U.S.C. 103(c) and potential 35 U.S.C. 102(e), (f) or (g) 
prior art under 35 U.S.C. 103(a). 

5. Claims 1-9 and 14-20 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Chakrabarti et al. (U.S. Patent 6,418,433) in view of Liang (U.S. 
PGPUB 2001/0044818), and further in view of Weiss et al. (U.S. Patent 7,085,753). 

6. Regarding claim 1, Chakrabarti teaches a method comprising: 
A) selectively prioritizing the documents to crawl based on a set of rules (Column 8, 
lines 2-30); 

B) fetching prioritized documents from the network (Column 5, lines 40-46); 

C) for each fetched document, determining whether the fetched document is relevant to 
any of the focus topics (Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 
61-65, Column 10, lines 18-43); 

D) crawling the fetched document that matches any of the focus topics such that the 
fetched document is crawled only once even if the fetched document matches a plurality 
of the focus documents (Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 
61-65, Column 9, lines 45-67-Column 10, lines 1-3, Column 10, lines 35-43); 

E) wherein the fetched document comprises a document of interest for access by a 
user (Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 61-65, Column 10, 
lines 18-43); 

F) further crawling out-links on the fetched document based on an assumption that if 
the fetched document is of interest, the out-links are also of interest (Column 2, lines 56- 
60, Column 3, lines 51-55, Column 4, lines 61-65, Column 10, lines 35-43); 

G) determined whether the fetched document should be disallowed (Column 2, lines 
56-60, Column 3, lines 51-55, Column 4, lines 61-65, Column 10, lines 18-34); and 

H) upon determination that the fetched document should be disallowed, selectively 
disallowing the fetched document (Column 10, lines 18-34); 

K) wherein the crawling is performed using a collaborative focus by analyzing the 
documents for more than one focus topic at a time (Column 4, lines 61-67-Column 5, 
lines 1-13). 

The examiner notes that Chakrabarti teaches "selectively prioritizing the 
documents to crawl based on a set of rules" as "The priority and relevance fields 
permit two types of crawl policies, i.e., the above-mentioned "soft" and "hard" crawl 
policies. For the "hard" crawl policy, the classifier 28 is invoked as described above on 
a Web page, and when it returns the best matching category path, the out-links of the 
page are entered into the crawl database 30 if and only if some node on the best 
matching category is marked as "good". FIG. 5 shows the details of such a "hard" crawl 
policy. As recognized herein, however, such a policy can lead to crawl stagnation, 
preferred solutions to which are addressed in FIGS. 5 and 6. Alternatively, a "soft" 
policy can be implemented in which all out-links are entered into the crawl database 30, 
but their crawl priority is based on the relevance of the current page. A batch of 
unvisited pages (typically, a few dozen per thread) are selected in lexicographic order of 
(Num_Tries, relevance desc, priority asc, bytehash), where "asc" means ascending 
"desc" means descending, and bytehash is a random number to resolve ties without 
loading any particular server. Each URL from the group is downloaded and classified, 
which generally leads to a revision of the relevance score. The revised relevance score 
is also written into the new records created for unvisited out-links" (Column 8, lines 8- 
30). The examiner further notes that Chakrabarti teaches "fetching prioritized 
documents from the network" as "the Web page table 32 includes a priority field 42 
that represents how often the Web page is to be revisited by the crawler 14" (Column 5, 
lines 41-42). The examiner further notes that Chakrabarti teaches "for each fetched 
document, determining whether the fetched document is relevant to any of the 
focus topics" as "The topic analyzer 28 compares the content of a Web page with a 
predefined topic or topics and generates a response representative of how relevant the 
Web page is" (Column 4, lines 61-65), "When the process determines that the page 
under test is not relevant to the predefined topic" (Column 10, lines 18-19), and "If the 
page under test is determined to be relevant to the topic" (Column 10, lines 35-36). The 
examiner further notes that Chakrabarti teaches "crawling the fetched document 
that matches any of the focus topics such that the fetched document is crawled 
only once even if the fetched document matches a plurality of the focus 
documents" as "If the page under test is determined to be relevant to the topic, 
however, the process moves to block 110, wherein entries are generated for the link 
table 34 for all outlinks of the page" (Column 10, lines 35-39) and "Moving to decision 
diamond 90 the worker thread determines whether the assigned page is a new page or 
an old page. If the page is an old page the logic moves to block 92 to retrieve only the 
modified portions, if any, of the page, i.e., the portions that the associated Web server 
indicates have changed since the last time the page was considered by the system 10. 
Accordingly, at decision diamond 94 it is determined by the system 10 whether in fact 
the old page has been changed as reported by the associated Web server, and if the 
page has not been changed, the process loops back to the sleep state at block 86. In 
contrast, if the page is an old page that has been determined to have changed at 
decision diamond 94, or if the page is determined to be a new page at decision diamond 
90, the logic moves to block 96 to retrieve the entire page from the associated Web 
server. At block 98, a checksum representative of the page's content is computed, and 
this checksum establishes the OID field 38 (FIG. 1) of the associated entry in the Web 
page table 32. Moving to decision diamond 100, when the page under test is an old 
page the checksum computed at block 98 is compared against the previous value in the 
associated OID field 38 to again determine, at a relatively fine level of granularity, 
whether any changes have occurred. If the checksum comparison indicates that no 
changes have occurred, the process loops back to sleep at block 86" (Column 9, lines 
45-67-Column 10, lines 1-3). The examiner further notes that Chakrabarti teaches 
"wherein the fetched document comprises a document of interest for access by a 
user" as "The topic analyzer 28 compares the content of a Web page with a predefined 
topic or topics and generates a response representative of how relevant the Web page 
is" (Column 4, lines 61-65), "When the process determines that the page under test is 
not relevant to the predefined topic" (Column 10, lines 18-19), and "If the page under 
test is determined to be relevant to the topic" (Column 10, lines 35-36). The examiner 
further notes that Chakrabarti teaches "further crawling out-links on the fetched 
document based on an assumption that if the fetched document is of interest, the 
out-links are also of interest" as "If the page under test is determined to be relevant to 
the topic, however, the process moves to block 110, wherein entries are generated for 
the link table 34 for all outlinks of the page" (Column 10, lines 35-39). The examiner 
further notes that Chakrabarti teaches "determined whether the fetched document 
should be disallowed" as "The topic analyzer 28 compares the content of a Web page 
with a predefined topic or topics and generates a response representative of how 
relevant the Web page is" (Column 4, lines 61-65) and "When the process determines 
that the page under test is not relevant to the predefined topic, the process moves to 
block 108 to update the Web page table 32 entries for the page under test (if the page is 
an old page), and then to return to block 86. It is to be understood that only the page 
under test is recorded at block 108, and that the outlinks of the page under test are not 
entered into the link table 34. Also, if the page under test is a new but irrelevant page, it 
is not added to the page table 32 at block 108. Thus, from one aspect, the page under 
test is pruned at block 108, in that its outlinks are not stored by the system 10 and the 
page itself is not stored if the page is a new but irrelevant page" (Column 10, lines 18- 
29). The examiner further notes that Chakrabarti teaches "upon determination that 
the fetched document should be disallowed, selectively disallowing the fetched 
document" as "When the process determines that the page under test is not relevant to 
the predefined topic, the process moves to block 108 to update the Web page table 32 
entries for the page under test (if the page is an old page), and then to return to block 
86. It is to be understood that only the page under test is recorded at block 108, and 
that the outlinks of the page under test are not entered into the link table 34. Also, if the 
page under test is a new but irrelevant page, it is not added to the page table 32 at 
block 108. Thus, from one aspect, the page under test is pruned at block 108, in that its 
outlinks are not stored by the system 10 and the page itself is not stored if the page is a 
new but irrelevant page" (Column 10, lines 18-29). The examiner further notes that 
Chakrabarti teaches "wherein the crawling is performed using a collaborative 
focus by analyzing the documents for more than one focus topic at a time" as 
"Additionally, the focussed crawler 14 accesses a topic analyzer 28 (also referred to 
herein as "hypertext classifier"). The topic analyzer 28 compares the content of a Web 
page with a predefined topic or topics and generates a response representative of how 
relevant the Web page is to the topic. A relevant Web page is referred to as a "good" 
Web page. Details of one preferred topic analyzer are set forth in co-pending U.S. 
patent application Ser. No. 09/143,733, filed Aug. 29, 1998, for an invention entitled 
"Method for Interactively Creating an Information Database Including Preferred 
Information Elements, such as Preferred-Authority, Worldwide Web Page", owned by 
the same assignee as is the present invention and incorporated herein by reference. Or, 
the following references provide topic analyzers: Chakrabarti et al., "Enhanced 
Hypertext Categorization Using Hyperlinks", SIGMOD ACM, 1998, and Chakrabarti et 
al., "Scalable Feature Selection, Classification, and Signature Generation for Organizing 
Large Text Databases into Hierarchical Taxonomies", VLDB Journal, invited paper, 
August, 1998" (Column 4, lines 61-67-Column 5, lines 1-13). 
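
For illustration only, the "soft" crawl policy ordering quoted above from Chakrabarti (Column 8, lines 8-30) can be sketched as a sort over unvisited page records. This is a minimal sketch and not Chakrabarti's implementation: the record type, field names, and function names are hypothetical, and the sketch assumes that Num_Tries and bytehash sort ascending, since the quoted passage gives a direction only for relevance and priority.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical record for one unvisited page (field names are assumptions)."""
    url: str
    num_tries: int     # number of earlier fetch attempts
    relevance: float   # relevance score inherited from the linking page
    priority: int      # revisit priority
    bytehash: int      # random tie-breaker


def crawl_order_key(rec: PageRecord):
    # Lexicographic order (Num_Tries, relevance desc, priority asc, bytehash).
    # Negating relevance turns the default ascending sort into a descending one.
    return (rec.num_tries, -rec.relevance, rec.priority, rec.bytehash)


def select_batch(unvisited, batch_size):
    """Return the next batch of pages to fetch under the assumed ordering."""
    return sorted(unvisited, key=crawl_order_key)[:batch_size]
```

Under these assumptions, pages with fewer fetch attempts come first, and among those the most relevant pages are fetched before less relevant ones.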

Chakrabarti does not explicitly teach: 
I) identifying a resource locator string associated with the disallowed fetched document; 
and 

J) placing the resource locator string for the disallowed fetched document in a blacklist 
in order to prevent future crawling of the fetched document. 

Liang, however, teaches "identifying a resource locator string associated 
with the disallowed fetched document" as "As shown in FIG. 9, in step 902, web 
spider 26 is provided with a first URL of a web site known to contain pornographic 
material. In a preferred embodiment, the web site is one that comprises a plurality of 
links to both additional pages at the pornographic website, as well as other 
pornographic websites" (Paragraph 63), and "wherein the miner comprises an 
unfocus miner that places the resulting uniform resource locator strings that 
match an unfocus topic in a blacklist, so that the uniform resource locator strings 
will not be crawled again" as "web spider 26 determines whether the retrieved web 
content contains pornographic material. If it does, then in step 908, web spider 26 adds 
the URL to list 28" (Paragraph 64). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because 
Liang's teaching would have allowed Chakrabarti's method to provide a way for web 
crawlers and spiders to dynamically restrict unwanted and unacceptable material, as 
noted by Liang (Paragraph 3). 
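
For illustration only, elements I) and J) as mapped to Liang above (identifying the URL of a disallowed document and placing it in a list so that it is not crawled again) can be sketched as follows. This is a hedged sketch, not code from Liang or Chakrabarti: the function name and the `fetch` and `is_disallowed` callables are hypothetical placeholders supplied by the caller.

```python
def crawl(urls, fetch, is_disallowed, blacklist):
    """Crawl each URL once, skipping and extending the blacklist as needed.

    urls          -- iterable of resource locator strings to visit
    fetch         -- hypothetical callable returning the content for a URL
    is_disallowed -- hypothetical callable deciding whether content is unwanted
    blacklist     -- mutable set of URLs that must not be crawled again
    """
    crawled = []
    for url in urls:
        if url in blacklist:
            # Previously disallowed: never fetched again.
            continue
        content = fetch(url)
        if is_disallowed(content):
            # Record the resource locator string of the disallowed document.
            blacklist.add(url)
            continue
        crawled.append(url)
    return crawled
```

On a second pass with the same blacklist, a URL that was disallowed earlier is skipped before any fetch occurs.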

Chakrabarti and Liang do not explicitly teach: 
L) analyzing a plurality of fetched documents obtained from the crawling by providing a 
graphical indicia of one or more properties that are indicative of which of the plurality of 
documents represent a best source of information for a specific topic within the plurality 
of fetched documents. 

Weiss, however, teaches "analyzing a plurality of fetched documents 
obtained from the crawling by providing a graphical indicia of one or more 
properties that are indicative of which of the plurality of documents represent a 
best source of information for a specific topic within the plurality of fetched 
documents" as "FIG. 5 schematically illustrates an example of clusters organized in a 
tree structure, according to an embodiment of the invention. The "Sport" cluster (or 
"group") contains several sub-clusters (or sub-groups)-Football, Basketball and Boxing 
sub-clusters, etc. The cluster "Charlie's Angels" appears as a sub-cluster of the TV 
Series cluster, as a sub-cluster of the movies cluster and as a sub-cluster of the boxing 
cluster (there is a boxing team that is called "Charlie's Angels"). The circles denote Web 
sites. A Web site can belong to several clusters. The data structure created by the 
clustering process can also be seen as a map of the web, since every site in the web 
has a specific location in the tree" (Column 7, lines 58-67-Column 8, lines 1-3) and "The 
last level of the focusing process is the presentation of a street, as described above. 
FIG. 10 schematically illustrates an example of a "street" presentation of a group of 
Web sites found in a Web search, according to an embodiment of the invention. The 
buildings, each represents a Web site, are numbered from 11 to 16. Building 14 
represents a Web site, which is owned by an enterprise, hence, its presentation is like 
an office building. Building 13 represents an amateur Web site and hence, it is 
presented like a private house. Building 16 represents a Web site that is owned by an 
academic institute, and therefore is presented like a campus. Building 11 represents a 
Web site that sells products, for example, it has an e-store, and thus it comprises a 
display-window. As mentioned above, the height of each building is relative to the 
number of hyperlinks pointing to and from the Web site represented by it. The width of 
the Web site represents, for example, the amount of information in the Web site. This 
parameter can be determined by the amount of words, pages, bytes, and so forth. It 
should be noted that the parameters of each Web site, as well as the continents, which 
are formed according to clusters, are attained and prepared for display by the search 
engine facility prior to the search by the user, by a process independent of the user 
search, which is carried out in real time. The application described above is 
geographically oriented. However, other reference "worlds" may be implemented in 
order to emphasize the attributes of a Web site" (Column 10, lines 24-50). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because 
Weiss's teaching would have allowed the combined method of Chakrabarti and Liang 
to present the Web sites obtained by web crawlers and spiders such that the 
visualization reveals certain attributes of the presented Web sites, as noted by 
Weiss (Column 3, lines 11-13). 
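
For illustration only, the "street" presentation quoted above from Weiss (Column 10, lines 24-50) can be sketched as a mapping from Web site attributes to building properties. This is a minimal sketch under stated assumptions, not Weiss's implementation: the type, field names, category labels, and the normalization of height and width to the largest site in the group are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from site type to building style, following the
# examples in the quoted passage.
BUILDING_STYLE = {
    "enterprise": "office building",
    "amateur": "private house",
    "academic": "campus",
    "store": "building with display-window",
}


@dataclass
class Site:
    url: str
    kind: str         # one of the BUILDING_STYLE keys (assumed labels)
    link_count: int   # hyperlinks pointing to and from the site
    word_count: int   # stand-in for "amount of information"


def to_building(site, max_links, max_words):
    """Derive the graphical properties for one site."""
    return {
        "url": site.url,
        "style": BUILDING_STYLE.get(site.kind, "generic building"),
        # Height is relative to the number of hyperlinks.
        "height": site.link_count / max_links if max_links else 0.0,
        # Width reflects the amount of information in the site.
        "width": site.word_count / max_words if max_words else 0.0,
    }


def street(sites):
    """Build the street view for a group of sites found in a search."""
    max_links = max((s.link_count for s in sites), default=0)
    max_words = max((s.word_count for s in sites), default=0)
    return [to_building(s, max_links, max_words) for s in sites]
```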

Regarding claim 2, Chakrabarti teaches a method comprising: 
A) seeding a plurality of seed uniform resource locator strings to start the collaborative 
focused crawling of the documents (Column 5, lines 61-67-Column 6, lines 1-15). 

The examiner notes that Chakrabarti teaches "seeding a plurality of seed 
uniform resource locator strings to start the collaborative focused crawling of the 
documents" as "It is to be understood that information pertaining to a "seed" set of 
Web pages is initially stored in the Web page table 32. The seed set can be gathered 
from, e.g., the temporary Internet file directories of the employees of a company or from 
some other group that can be expected to have shared interests... Thus, the seed set 
does not define a comprehensive, universal set of all topics on the Web, but rather a 
relatively narrow topic or range of topics that are of interest to the particular source" 
(Column 5, lines 61-67-Column 6, lines 1-4). 

Regarding claim 3, Chakrabarti teaches a method comprising: 
A) crawling the seed uniform resource locator strings (Column 6, lines 61-67-Column 7, 
lines 1-2, Column 10, lines 44-64). 

The examiner notes that Chakrabarti teaches "crawling the seed uniform 
resource locator strings" as "starting with the seed set the URL of each page is 
selected" (Column 6, lines 61-62) and "the current page is classified to its topics, using 
the topic analyzer 28 (FIG. 1), and then the page is evaluated for relevancy to the 
predefined topic at the decision diamond 116... when the page is a "good" page the logic 
expands the outlinks of the page" (Column 10, lines 45-51). 

Regarding claim 4, Chakrabarti teaches a method comprising: 
A) writing a plurality of resulting uniform resource locator strings obtained by crawling 
the seed uniform resource locator strings (Column 10, lines 35-43, 51-64). 

The examiner notes that Chakrabarti teaches "writing a plurality of resulting 
uniform resource locator strings obtained by crawling the seed uniform resource 
locator strings" as "If the page under test is determined to be relevant to the topic, 
however, the process moves to block 110, wherein entries are generated for the link 
table 34 for all outlinks of the page" (Column 10, lines 35-38). 

Regarding claim 5, Chakrabarti teaches a method comprising: 
A) a foreman function for reading a plurality of contents of the resulting uniform 
resource locator strings (Column 10, lines 4-10, 51-64). 

The examiner notes that Chakrabarti teaches "a foreman function for reading 
a plurality of contents of the resulting uniform resource locator strings" as "If the 
checksum comparison at decision diamond 100 indicates that new data is being 
considered, however, the logic proceeds to block 102 to tokenize the Web page" 
(Column 10, lines 4-6). 
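
For illustration only, the checksum comparison referenced above (blocks 98 to 102 of Chakrabarti) can be sketched as follows. This is a hedged sketch: the quoted passages do not name a checksum algorithm, so SHA-256 is an assumption, and the function names are hypothetical.

```python
import hashlib


def content_checksum(content: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified checksum that
    # establishes the OID field of the page table entry.
    return hashlib.sha256(content).hexdigest()


def needs_processing(content: bytes, previous_oid=None):
    """Return (changed, oid) for a fetched page.

    A new page (no previous OID) or a page whose checksum differs from the
    stored OID is treated as new data and would proceed to tokenization;
    otherwise the worker would go back to sleep.
    """
    oid = content_checksum(content)
    changed = previous_oid is None or oid != previous_oid
    return changed, oid
```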

Regarding claim 6, Chakrabarti teaches a method comprising: 
A) the foreman function passing the contents of the resulting uniform resource locator 
strings to a miner (Column 10, lines 10-17, 51-64). 

The examiner notes that Chakrabarti teaches "the foreman function passing the 
contents of the resulting uniform resource locator strings to a miner" as "Then, 
the page is classified at block 104 using the topic analyzer or classifier 28" (Column 10, 
lines 10-11). 

Regarding claim 7, Chakrabarti teaches a method comprising: 
A) the miner instructing a fetcher to crawl a plurality of out-links on a document of the 
resulting resource locator string when the contents of the resulting resource locator 
string match a focus topic of the miner (Column 10, lines 35-43, 51-64). 

The examiner notes that Chakrabarti teaches "the miner instructing a fetcher 
to crawl a plurality of out-links on a document of the resulting resource locator 
string when the contents of the resulting resource locator string match a focus 
topic of the miner" as "If the page under test is determined to be relevant to the topic, 
however, the process moves to block 110, wherein entries are generated for the link 
table 34 for all outlinks of the page" (Column 10, lines 35-38). 

Regarding claim 8, Chakrabarti teaches a method comprising: 
A) the miner ignoring resulting resource locator string when the contents of the 
resulting resource locator string do not match the focus of the miner (Column 10, lines 
18-34). 

The examiner notes that Chakrabarti teaches "the miner ignoring resulting 
resource locator string when the contents of the resulting resource locator 
string do not match the focus of the miner" as "When the process determines that 
the page under test is not 
relevant to the predefined topic, the process moves to block 108 to update the Web 
page table 32... the outlinks of the page under test are not entered into the link table" 
(Column 10, lines 18-24). 
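
For illustration only, the relevance gate relied on for claims 7 and 8 (blocks 108 and 110 of Chakrabarti, Column 10, lines 18-43) can be sketched as follows. This is a minimal sketch, not Chakrabarti's implementation: the function name, the dictionary and set used for the page and link tables, and the choice to store relevant pages in the page table are assumptions.

```python
def process_page(url, outlinks, is_relevant, is_old_page, page_table, link_table):
    """Apply the relevance gate to one classified page.

    page_table -- dict mapping URL to a stored record (stand-in for table 32)
    link_table -- set of (source, target) pairs (stand-in for table 34)
    """
    if is_relevant:
        # Relevant page: entries are generated for all of its outlinks.
        page_table[url] = {"relevant": True}  # assumption: relevant pages are stored
        for target in outlinks:
            link_table.add((url, target))
        return "expanded"
    # Irrelevant page: its outlinks are never entered into the link table.
    if is_old_page:
        # Only an existing entry is updated; a new irrelevant page is not stored.
        page_table[url] = {"relevant": False}
    return "pruned"
```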

Regarding claim 9, Chakrabarti teaches a method comprising: 
A) the miner managing a plurality of focus topics (Column 2, lines 56-60, Column 3, 
lines 51-55, Column 4, lines 61-65). 

The examiner notes that Chakrabarti teaches "the miner managing a plurality 
of focus topics" as "The topic analyzer 28 compares the content of a Web page with a 
predefined topic or topics and generates a response representative of how relevant the 
Web page is" (Column 4, lines 61-65). 

Regarding claim 14, Chakrabarti teaches a computer program product 
comprising: 

A) a first set of instruction codes for selectively prioritizing the documents to crawl 
based on a set of rules (Column 8, lines 2-30); 

B) a second set of instruction codes for fetching prioritized documents from the network 
(Column 5, lines 40-46); 

C) for each fetched document, a third set of instruction codes determines whether the 
fetched document is relevant to any of the focus topics (Column 2, lines 56-60, Column 
3, lines 51-55, Column 4, lines 61-65, Column 10, lines 18-43); 

D) a fourth set of instruction codes for crawling the fetched document that matches any 
of the focus topics such that the fetched document is crawled only once even if the 
fetched document matches a plurality of the focus topics (Column 2, lines 56-60, 
Column 3, lines 51-55, Column 4, lines 61-65, Column 9, lines 45-67-Column 10, lines 
1-3, Column 10, lines 35-43); 

E) wherein the fetched document comprises a document of interest for access by a 
user (Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 61-65, Column 10, 
lines 18-43); 

F) wherein the fourth set of instruction codes further crawls out-links on the fetched 
document based on an assumption that if the fetched document is of interest, the out- 
links are also of interest (Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 
61-65, Column 10, lines 35-43); 

G) wherein the fourth set of instruction codes further determine whether the fetched 
document should be disallowed (Column 2, lines 56-60, Column 3, lines 51-55, Column 
4, lines 61-65, Column 10, lines 18-34); and 

H) upon determination that the fetched document should be disallowed, selectively 
disallowing the fetched document (Column 10, lines 18-34); 

K) wherein the crawling is performed using a collaborative focus by analyzing the 
documents for more than one focus topic at a time (Column 4, lines 61-67-Column 5, 
lines 1-13). 

The examiner notes that Chakrabarti teaches "a first set of instruction codes 
for selectively prioritizing the documents to crawl based on a set of rules" as "The 
priority and relevance fields permit two types of crawl policies, i.e., the above-mentioned 
"soft" and "hard" crawl policies" (Column 8, lines 8-11). The examiner further notes that 
Chakrabarti teaches "a second set of instruction codes for fetching prioritized 
documents from the network" as "the Web page table 32 includes a priority field 42 
that represents how often the Web page is to be revisited by the crawler 14" (Column 5, 
lines 41-42). The examiner further notes that Chakrabarti teaches "for each fetched 
document, a third set of instruction codes determines whether the fetched 
document is relevant to any of the focus topics" as "The topic analyzer 28 
compares the content of a Web page with a predefined topic or topics and generates a 
response representative of how relevant the Web page is" (Column 4, lines 61-65), 
"When the process determines that the page under test is not relevant to the predefined 
topic" (Column 10, lines 18-19), and "If the page under test is determined to be relevant 
to the topic" (Column 10, lines 35-36). The examiner further notes that Chakrabarti 
teaches "a fourth set of instruction codes for crawling the fetched document that 
matches any of the focus topics such that the fetched document is crawled only 
once even if the fetched document matches a plurality of the focus topics" as "If 
the page under test is determined to be relevant to the topic, however, the process 
moves to block 110, wherein entries are generated for the link table 34 for all outlinks of 
the page" (Column 10, lines 35-39) and "Moving to decision diamond 90 the worker 
thread determines whether the assigned page is a new page or an old page. If the page 
is an old page the logic moves to block 92 to retrieve only the modified portions, if any, 
of the page, i.e., the portions that the associated Web server indicates have changed 
since the last time the page was considered by the system 10. Accordingly, at decision 
diamond 94 it is determined by the system 10 whether in fact the old page has been 
changed as reported by the associated Web server, and if the page has not been 
changed, the process loops back to the sleep state at block 86. In contrast, if the page 
is an old page that has been determined to have changed at decision diamond 94, or if 
the page is determined to be a new page at decision diamond 90, the logic moves to 
block 96 to retrieve the entire page from the associated Web server. At block 98, a 
checksum representative of the page's content is computed, and this checksum 
establishes the OID field 38 (FIG. 1) of the associated entry in the Web page table 32. 
Moving to decision diamond 100, when the page under test is an old page the 
checksum computed at block 98 is compared against the previous value in the 
associated OID field 38 to again determine, at a relatively fine level of granularity, 
whether any changes have occurred. If the checksum comparison indicates that no 
changes have occurred, the process loops back to sleep at block 86" (Column 9, lines 
45-67-Column 10, lines 1-3). The examiner further notes that Chakrabarti teaches 
"wherein the fetched document comprises a document of interest for access by a 
user" as "The topic analyzer 28 compares the content of a Web page with a predefined 
topic or topics and generates a response representative of how relevant the Web page 
is" (Column 4, lines 61-65), "When the process determines that the page under test is 
not relevant to the predefined topic" (Column 10, lines 18-19), and "If the page under 
test is determined to be relevant to the topic" (Column 10, lines 35-36). The examiner 
further notes that Chakrabarti teaches "wherein the fourth set of instruction codes 
further crawls out-links on the fetched document based on an assumption that if 
the fetched document is of interest, the out-links are also of interest" as "If the 
page under test is determined to be relevant to the topic, however, the process moves 
to block 110, wherein entries are generated for the link table 34 for all outlinks of the 
page" (Column 10, lines 35-39). The examiner further notes that Chakrabarti teaches 
"wherein the fourth set of instruction codes further determines whether the 
fetched document should be disallowed" as "The topic analyzer 28 compares the 
content of a Web page with a predefined topic or topics and generates a response 
representative of how relevant the Web page is" (Column 4, lines 61-65) and "When the 
process determines that the page under test is not relevant to the predefined topic, the 
process moves to block 108 to update the Web page table 32 entries for the page under 
test (if the page is an old page), and then to return to block 86. It is to be understood 
that only the page under test is recorded at block 108, and that the outlinks of the page 
under test are not entered into the link table 34. Also, if the page under test is a new but 
irrelevant page, it is not added to the page table 32 at block 108. Thus, from one aspect, 
the page under test is pruned at block 108, in that its outlinks are not stored by the 
system 10 and the page itself is not stored if the page is a new but irrelevant page" 
(Column 10, lines 18-29). The examiner further notes that Chakrabarti teaches "upon 
determination that the fetched document should be disallowed, selectively 
disallowing the fetched document" as "When the process determines that the page 
under test is not relevant to the predefined topic, the process moves to block 108 to 
update the Web page table 32 entries for the page under test (if the page is an old 
page), and then to return to block 86. It is to be understood that only the page under test 
is recorded at block 108, and that the outlinks of the page under test are not entered 
into the link table 34. Also, if the page under test is a new but irrelevant page, it is not 
added to the page table 32 at block 108. Thus, from one aspect, the page under test is 
pruned at block 108, in that its outlinks are not stored by the system 10 and the page 
itself is not stored if the page is a new but irrelevant page" (Column 10, lines 18-29). 
The examiner further notes that Chakrabarti teaches "wherein the crawling is 
performed using a collaborative focus by analyzing the documents for more than 
one focus topic at a time" as "Additionally, the focussed crawler 14 accesses a topic 
analyzer 28 (also referred to herein as "hypertext classifier"). The topic analyzer 28 
compares the content of a Web page with a predefined topic or topics and generates a 
response representative of how relevant the Web page is to the topic. A relevant Web 
page is referred to as a "good" Web page. Details of one preferred topic analyzer are 
set forth in co-pending U.S. patent application Ser. No. 09/143,733, filed Aug. 29, 1998, 
for an invention entitled "Method for Interactively Creating an Information Database 
Including Preferred Information Elements, such as Preferred-Authority, Worldwide Web 
Page", owned by the same assignee as is the present invention and incorporated herein 
by reference. Or, the following references provide topic analyzers: Chakrabarti et al., 
"Enhanced Hypertext Categorization Using Hyperlinks", SIGMOD ACM, 1998, and 
Chakrabarti et al., "Scalable Feature Selection, Classification, and Signature Generation 
for Organizing Large Text Databases into Hierarchical Taxonomies", VLDB Journal, 
invited paper, August, 1998" (Column 4, lines 61-67-Column 5, lines 1-13). 

Chakrabarti does not explicitly teach: 
I) identifying a resource locator string associated with the disallowed fetched document; 
and 

J) placing the resource locator string for the fetched document in a blacklist in order to 
prevent future crawling of the fetched document. 

Liang, however, teaches "identifying a resource locator string associated 
with the disallowed fetched document" as "As shown in FIG. 9, in step 902, web 
spider 26 is provided with a first URL of a web site known to contain pornographic 
material. In a preferred embodiment, the web site is one that comprises a plurality of 
links to both additional pages at the pornographic website, as well as other 
pornographic websites" (Paragraph 63), and "wherein the miner comprises an 
unfocus miner that places the resulting uniform resource locator strings that 
match an unfocus topic in a blacklist, so that the uniform resource locator strings 
will not be crawled again" as "web spider 26 determines whether the retrieved web 
content contains pornographic material. If it does, then in step 908, web spider 26 adds 
the URL to list 28" (Paragraph 63). 
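
For illustration only, the blacklist behavior attributed to Liang above (the spider adds the URL of objectionable content to a list so that the URL is not crawled again) might be sketched in Python as follows. The names `Blacklist`, `normalize`, `is_disallowed`, and `crawl_once`, as well as the simple keyword test, are hypothetical assumptions and are not taken from Liang or from the claims.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lower-case the scheme and host and drop the fragment (assumed normalization)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


class Blacklist:
    """Hypothetical stand-in for a list of URLs that must not be crawled again."""

    def __init__(self) -> None:
        self._urls: set[str] = set()

    def add(self, url: str) -> None:
        self._urls.add(normalize(url))

    def __contains__(self, url: str) -> bool:
        return normalize(url) in self._urls


def is_disallowed(text: str, unfocus_terms: set[str]) -> bool:
    """Assumed content test: the page contains at least one unwanted term."""
    return bool(set(text.lower().split()) & unfocus_terms)


def crawl_once(url, fetch, blacklist, unfocus_terms):
    """Fetch a page unless it is blacklisted; blacklist it if its content is disallowed."""
    if url in blacklist:
        return None  # previously disallowed, so it is never fetched again
    text = fetch(url)
    if is_disallowed(text, unfocus_terms):
        blacklist.add(url)  # record the resource locator string of the disallowed page
        return None
    return text
```

In this sketch a disallowed page is fetched at most once; every later request for the same URL is rejected by the membership test before any network access.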

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Liang's 
teachings would have allowed Chakrabarti's method to let web crawlers and spiders 
dynamically restrict unwanted and unacceptable material, as noted by Liang 
(Paragraph 3). 

Chakrabarti and Liang do not explicitly teach: 
L) analyzing a plurality of fetched documents obtained from the crawling by providing a 
graphical indicia of one or more properties that are indicative of which of the plurality of 
documents represent a best source of information for a specific topic within the plurality 
of fetched documents. 

Weiss, however, teaches "analyzing a plurality of fetched documents 
obtained from the crawling by providing a graphical indicia of one or more 
properties that are indicative of which of the plurality of documents represent a 
best source of information for a specific topic within the plurality of fetched 
documents" as "FIG. 5 schematically illustrates an example of clusters organized in a 
tree structure, according to an embodiment of the invention. The "Sport" cluster (or 
"group") contains several sub-clusters (or sub-groups)-Football, Basketball and Boxing 
sub-clusters, etc. The cluster "Charlie's Angels" appears as a sub-cluster of the TV 
Series cluster, as a sub-cluster of the movies cluster and as a sub-cluster of the boxing 
cluster (there is a boxing team that is called "Charlie's Angels"). The circles denote Web 
sites. A Web site can belong to several clusters. The data structure created by the 
clustering process can also be seen as a map of the web, since every site in the web 
has a specific location in the tree" (Column 7, lines 58-67-Column 8, lines 1-3) and "The 
last level of the focusing process is the presentation of a street, as described above. 
FIG. 10 schematically illustrates an example of a "street" presentation of a group of 
Web sites found in a Web search, according to an embodiment of the invention. The 
buildings, each represents a Web site, are numbered from 11 to 16. Building 14 
represents a Web site, which is owned by an enterprise, hence, its presentation is like 
an office building. Building 13 represents an amateur Web site and hence, it is 
presented like a private house. Building 16 represents a Web site that is owned by an 
academic institute, and therefore is presented like a campus. Building 11 represents a 
Web site that sells products, for example, it has an e-store, and thus it comprises a 
display-window. As mentioned above, the height of each building is relative to the 
number of hyperlinks pointing to and from the Web site represented by it. The width of 
the Web site represents, for example, the amount of information in the Web site. This 
parameter can be determined by the amount of words, pages, bytes, and so forth. It 
should be noted that the parameters of each Web site, as well as the continents, which 
are formed according to clusters, are attained and prepared for display by the search 
engine facility prior to the search by the user, by a process independent of the user 
search, which is carried out in real time. The application described above is 
geographically oriented. However, other reference "worlds" may be implemented in 
order to emphasize the attributes of a Web site" (Column 10, lines 24-50). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Weiss's 
teachings would have allowed Chakrabarti's and Liang's methods to let web crawlers 
and spiders obtain internet content and present the Web sites such that the 
visualization reveals certain attributes of the presented Web sites, as noted by 
Weiss (Column 3, lines 11-13). 

Regarding claim 15, Chakrabarti teaches a computer program product 
comprising: 

A) a fifth set of instruction codes for seeding a plurality of seed uniform resource locator 
strings to start the collaborative focused crawling of the documents (Column 5, lines 61- 
67-Column 6, lines 1-15). 

The examiner notes that Chakrabarti teaches "a fifth set of instruction codes 
for seeding a plurality of seed uniform resource locator strings to start the 
collaborative focused crawling of the documents" as "It is to be understood that 
information pertaining to a "seed" set of Web pages is initially stored in the Web page 
table 32. The seed set can be gathered from, e.g., the temporary Internet file 
directories of the employees of a company or from some other group that can be 
expected to have shared interests... Thus, the seed set does not define a 
comprehensive, universal set of all topics on the Web, but rather a relatively narrow 
topic or range of topics that are of interest to the particular source" (Column 5, lines 61- 
67-Column 6, lines 1-4). 

Regarding claim 16, Chakrabarti teaches a computer program product 
comprising: 

A) wherein the fourth set of instruction codes further crawls the seed uniform resource 
locator strings (Column 6, lines 61-67-Column 7, lines 1-2, Column 10, lines 44-64). 

The examiner notes that Chakrabarti teaches "wherein the fourth set of 
instruction codes further crawls the seed uniform resource locator strings" as 
"starting with the seed set the URL of each page is selected" (Column 6, lines 61-62) 
and "the current page is classified to its topics, using the topic analyzer 28 (FIG. 1), and 
then the page is evaluated for relevancy to the predefined topic at the decision diamond 
116... when the page is a "good" page the logic expands the outlinks of the page" 
(Column 10, lines 45-51). 

Regarding claim 17, Chakrabarti teaches a computer program product 
comprising: 

A) a sixth set of instruction codes for writing a plurality of resulting uniform resource 
locator strings obtained by crawling the seed uniform resource locator strings (Column 
10, lines 35-43, 51-64). 

The examiner notes that Chakrabarti teaches "a sixth set of instruction codes 
for writing a plurality of resulting uniform resource locator strings obtained by 
crawling the seed uniform resource locator strings" as "If the page under test is 
determined to be relevant to the topic, however, the process moves to block 110, 
wherein entries are generated for the link table 34 for all outlinks of the page" (Column 
10, lines 35-38). 

Regarding claim 18, Chakrabarti teaches a system comprising: 

A) an evaluator that selectively prioritizes the documents to crawl based on a set of 
rules (Column 8, lines 2-30); 

B) a fetcher that fetches prioritized documents from the network (Column 5, lines 40- 
46); 

C) for each fetched document, a focus engine determines whether the fetched 
document is relevant to any of the focus topics (Column 2, lines 56-60, Column 3, lines 
51-55, Column 4, lines 61-65, Column 10, lines 18-43); 

D) a crawler for crawling the fetched document that matches any of the multiple focus 
topics such that the fetched document is crawled only once even if the fetched 
document matches a plurality of the focus topics (Column 2, lines 56-60, Column 3, 
lines 51-55, Column 4, lines 61-65, Column 9, lines 45-67-Column 10, lines 1-3, Column 
10, lines 35-43); 

E) wherein the fetched document comprises a document of interest for access by a 
user (Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 61-65, Column 10, 
lines 18-43); 

F) wherein the crawler further crawls out-links on the fetched document based on an 
assumption that if the fetched document is of interest, the out-links are also of interest 
(Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 61-65, Column 10, lines 
35-43); 

G) wherein the crawler further determines whether the fetched document should be 
disallowed (Column 2, lines 56-60, Column 3, lines 51-55, Column 4, lines 61-65, 
Column 10, lines 18-34); and 

H) upon determination that the fetched document should be disallowed, selectively 
disallowing the fetched document (Column 10, lines 18-34); 

K) wherein the crawling is performed using a collaborative focus by analyzing the 
documents for more than one focus topic at a time (Column 4, lines 61-67-Column 5, 
lines 1-13). 

The examiner notes that Chakrabarti teaches "an evaluator that selectively 
prioritizes the documents to crawl based on a set of rules" as "the Web page table 
32 includes a priority field 42 that represents how often the Web page is to be revisited 
by the crawler 14" (Column 5, lines 41-42). The examiner further notes that 
Chakrabarti teaches "for each fetched document, a focus engine determines 
whether the fetched document is relevant to any of the focus topics" as "The topic 
analyzer 28 compares the content of a Web page with a predefined topic or topics and 
generates a response representative of how relevant the Web page is" (Column 4, lines 
61-65), "When the process determines that the page under test is not relevant to the 
predefined topic" (Column 10, lines 18-19), and "If the page under test is determined to 
be relevant to the topic" (Column 10, lines 35-36). The examiner further notes that 
Chakrabarti teaches "a crawler for crawling the fetched document that matches 
any of the multiple focus topics such that the fetched document is crawled only 
once even if the fetched document matches a plurality of the focus topics" as "If 
the page under test is determined to be relevant to the topic, however, the process 
moves to block 110, wherein entries are generated for the link table 34 for all outlinks of 
the page" (Column 10, lines 35-39) and "Moving to decision diamond 90 the worker 
thread determines whether the assigned page is a new page or an old page. If the page 
is an old page the logic moves to block 92 to retrieve only the modified portions, if any, 
of the page, i.e., the portions that the associated Web server indicates have changed 
since the last time the page was considered by the system 10. Accordingly, at decision 
diamond 94 it is determined by the system 10 whether in fact the old page has been 
changed as reported by the associated Web server, and if the page has not been 
changed, the process loops back to the sleep state at block 86. In contrast, if the page 
is an old page that has been determined to have changed at decision diamond 94, or if 
the page is determined to be a new page at decision diamond 90, the logic moves to 
block 96 to retrieve the entire page from the associated Web server. At block 98, a 
checksum representative of the page's content is computed, and this checksum 
establishes the OID field 38 (FIG. 1) of the associated entry in the Web page table 32. 
Moving to decision diamond 100, when the page under test is an old page the 
checksum computed at block 98 is compared against the previous value in the 
associated OID field 38 to again determine, at a relatively fine level of granularity, 
whether any changes have occurred. If the checksum comparison indicates that no 
changes have occurred, the process loops back to sleep at block 86" (Column 9, lines 
45-67-Column 10, lines 1-3). The examiner further notes that Chakrabarti teaches 
"wherein the fetched document comprises a document of interest for access by a 
user" as "The topic analyzer 28 compares the content of a Web page with a predefined 
topic or topics and generates a response representative of how relevant the Web page 
is" (Column 4, lines 61-65), "When the process determines that the page under test is 
not relevant to the predefined topic" (Column 10, lines 18-19), and "If the page under 
test is determined to be relevant to the topic" (Column 10, lines 35-36). The examiner 
further notes that Chakrabarti teaches "wherein the crawler further crawls out-links 
on the fetched document based on an assumption that if the fetched document is 
of interest, the out-links are also of interest" as "If the page under test is determined 
to be relevant to the topic, however, the process moves to block 110, wherein entries 
are generated for the link table 34 for all outlinks of the page" (Column 10, lines 35-39). 
The examiner further notes that Chakrabarti teaches "wherein the crawler further 
determines whether the fetched document should be disallowed" as "The topic analyzer 
28 compares the content of a Web page with a predefined topic or topics and generates 
a response representative of how relevant the Web page is" (Column 4, lines 61-65) 
and "When the process determines that the page under test is not relevant to the 
predefined topic, the process moves to block 108 to update the Web page table 32 
entries for the page under test (if the page is an old page), and then to return to block 
86. It is to be understood that only the page under test is recorded at block 108, and 
that the outlinks of the page under test are not entered into the link table 34. Also, if the 
page under test is a new but irrelevant page, it is not added to the page table 32 at 
block 108. Thus, from one aspect, the page under test is pruned at block 108, in that its 
outlinks are not stored by the system 10 and the page itself is not stored if the page is a 
new but irrelevant page" (Column 10, lines 18-29). The examiner further notes that 
Chakrabarti teaches "upon determination that the fetched document should be 
disallowed, selectively disallowing the fetched document" as "When the process 
determines that the page under test is not relevant to the predefined topic, the process 
moves to block 108 to update the Web page table 32 entries for the page under test (if 
the page is an old page), and then to return to block 86. It is to be understood that only 
the page under test is recorded at block 108, and that the outlinks of the page under 
test are not entered into the link table 34. Also, if the page under test is a new but 
irrelevant page, it is not added to the page table 32 at block 108. Thus, from one aspect, 
the page under test is pruned at block 108, in that its outlinks are not stored by the 
system 10 and the page itself is not stored if the page is a new but irrelevant page" 
(Column 10, lines 18-29). The examiner further notes that Chakrabarti teaches 
"wherein the crawling is performed using a collaborative focus by analyzing the 
documents for more than one focus topic at a time" as "Additionally, the focussed 
crawler 14 accesses a topic analyzer 28 (also referred to herein as "hypertext 
classifier"). The topic analyzer 28 compares the content of a Web page with a 
predefined topic or topics and generates a response representative of how relevant the 
Web page is to the topic. A relevant Web page is referred to as a "good" Web page. 
Details of one preferred topic analyzer are set forth in co-pending U.S. patent 
application Ser. No. 09/143,733, filed Aug. 29, 1998, for an invention entitled "Method 
for Interactively Creating an Information Database Including Preferred Information 
Elements, such as Preferred-Authority, Worldwide Web Page", owned by the same 
assignee as is the present invention and incorporated herein by reference. Or, the 
following references provide topic analyzers: Chakrabarti et al., "Enhanced Hypertext 
Categorization Using Hyperlinks", SIGMOD ACM, 1998, and Chakrabarti et al., 
"Scalable Feature Selection, Classification, and Signature Generation for Organizing 
Large Text Databases into Hierarchical Taxonomies", VLDB Journal, invited paper, 
August, 1998" (Column 4, lines 61-67-Column 5, lines 1-13). 
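
For illustration only, the relevance test and pruning behavior quoted from Chakrabarti above (outlinks of a relevant page are entered into the link table, the outlinks of an irrelevant page are not, and a new but irrelevant page is not stored) might be sketched in Python as follows. The function name `process_page`, the dictionary and set used as stand-ins for the page table 32 and link table 34, and the `is_relevant` callback are hypothetical assumptions, not Chakrabarti's implementation.

```python
def process_page(url, content, outlinks, is_relevant, page_table, link_table):
    """Update the tables for one fetched page; return True if its outlinks were expanded."""
    is_old = url in page_table
    if not is_relevant(content):
        # Pruned: the outlinks are never stored, and only a previously
        # known (old) page has its page-table entry updated.
        if is_old:
            page_table[url] = {"relevant": False}
        return False
    # Relevant page: record it and enter every outlink into the link table.
    page_table[url] = {"relevant": True}
    for target in outlinks:
        link_table.add((url, target))
    return True
```

Under these assumptions, a caller would pass each fetched page through `process_page` and schedule further crawling only from the entries that reach the link table.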

Chakrabarti does not explicitly teach: 
I) identifying a resource locator string associated with the disallowed fetched document; 
and 

J) placing the resource locator string for the fetched document in a blacklist in order to 
prevent future crawling of the fetched document. 

Liang, however, teaches "identifying a resource locator string associated 
with the disallowed fetched document" as "As shown in FIG. 9, in step 902, web 
spider 26 is provided with a first URL of a web site known to contain pornographic 
material. In a preferred embodiment, the web site is one that comprises a plurality of 
links to both additional pages at the pornographic website, as well as other 
pornographic websites" (Paragraph 63), and "wherein the miner comprises an 
unfocus miner that places the resulting uniform resource locator strings that 
match an unfocus topic in a blacklist, so that the uniform resource locator strings 
will not be crawled again" as "web spider 26 determines whether the retrieved web 
content contains pornographic material. If it does, then in step 908, web spider 26 adds 
the URL to list 28" (Paragraph 63). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Liang's 
teachings would have allowed Chakrabarti's method to let web crawlers and spiders 
dynamically restrict unwanted and unacceptable material, as noted by Liang 
(Paragraph 3). 

Chakrabarti and Liang do not explicitly teach: 
L) analyzing a plurality of fetched documents obtained from the crawling by providing a 
graphical indicia of one or more properties that are indicative of which of the plurality of 
documents represent a best source of information for a specific topic within the plurality 
of fetched documents. 

Weiss, however, teaches "analyzing a plurality of fetched documents 
obtained from the crawling by providing a graphical indicia of one or more 
properties that are indicative of which of the plurality of documents represent a 
best source of information for a specific topic within the plurality of fetched 
documents" as "FIG. 5 schematically illustrates an example of clusters organized in a 
tree structure, according to an embodiment of the invention. The "Sport" cluster (or 
"group") contains several sub-clusters (or sub-groups)-Football, Basketball and Boxing 
sub-clusters, etc. The cluster "Charlie's Angels" appears as a sub-cluster of the TV 
Series cluster, as a sub-cluster of the movies cluster and as a sub-cluster of the boxing 
cluster (there is a boxing team that is called "Charlie's Angels"). The circles denote Web 
sites. A Web site can belong to several clusters. The data structure created by the 
clustering process can also be seen as a map of the web, since every site in the web 
has a specific location in the tree" (Column 7, lines 58-67-Column 8, lines 1-3) and "The 
last level of the focusing process is the presentation of a street, as described above. 
FIG. 10 schematically illustrates an example of a "street" presentation of a group of 
Web sites found in a Web search, according to an embodiment of the invention. The 
buildings, each represents a Web site, are numbered from 11 to 16. Building 14 
represents a Web site, which is owned by an enterprise, hence, its presentation is like 
an office building. Building 13 represents an amateur Web site and hence, it is 
presented like a private house. Building 16 represents a Web site that is owned by an 
academic institute, and therefore is presented like a campus. Building 11 represents a 
Web site that sells products, for example, it has an e-store, and thus it comprises a 
display-window. As mentioned above, the height of each building is relative to the 
number of hyperlinks pointing to and from the Web site represented by it. The width of 
the Web site represents, for example, the amount of information in the Web site. This 
parameter can be determined by the amount of words, pages, bytes, and so forth. It 
should be noted that the parameters of each Web site, as well as the continents, which 
are formed according to clusters, are attained and prepared for display by the search 
engine facility prior to the search by the user, by a process independent of the user 
search, which is carried out in real time. The application described above is 
geographically oriented. However, other reference "worlds" may be implemented in 
order to emphasize the attributes of a Web site" (Column 10, lines 24-50). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Weiss's 
teachings would have allowed Chakrabarti's and Liang's methods to let web crawlers 
and spiders obtain internet content and present the Web sites such that the 
visualization reveals certain attributes of the presented Web sites, as noted by 
Weiss (Column 3, lines 11-13). 

Regarding claim 19, Chakrabarti teaches a system comprising: 
A) a plurality of seed uniform resource locator strings that are used to initiate the 
collaborative focused crawling of the documents (Column 5, lines 61-67-Column 6, lines 
1-15). 

The examiner notes that Chakrabarti teaches "a plurality of seed uniform 
resource locator strings that are used to initiate the collaborative focused 
crawling of the documents" as "It is to be understood that information pertaining to a 
"seed" set of Web pages is initially stored in the Web page table 32. The seed set can 
be gathered from, e.g., the temporary Internet file directories of the employees of a 
company or from some other group that can be expected to have shared 
interests... Thus, the seed set does not define a comprehensive, universal set of all 
topics on the Web, but rather a relatively narrow topic or range of topics that are of 
interest to the particular source" (Column 5, lines 61-67-Column 6, lines 1-4). 

Regarding claim 20, Chakrabarti teaches a system comprising: 
A) wherein the crawler further crawls the seed uniform resource locator strings (Column 
6, lines 61-67-Column 7, lines 1-2, Column 10, lines 44-64). 

The examiner notes that Chakrabarti teaches "wherein the crawler further 
crawls the seed uniform resource locator strings" as "starting with the seed set the 
URL of each page is selected" (Column 6, lines 61-62) and "the current page is 
classified to its topics, using the topic analyzer 28 (FIG. 1), and then the page is 
evaluated for relevancy to the predefined topic at the decision diamond 116... when the 
page is a "good" page the logic expands the outlinks of the page" (Column 10, lines 45- 
51). 

7. Claims 10-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Chakrabarti et al. (U.S. Patent 6,418,433) in view of Liang (U.S. PGPUB 
2001/0044818), and further in view of Weiss et al. (U.S. Patent 7,085,753) as applied 
to claims 1-9, and 14-20 above, and in view of Heydon et al. (Article entitled "Mercator: 
A Scalable, Extensible Web Crawler", dated 06/26/1999). 

8. Regarding claim 10, Chakrabarti, Liang, and Weiss do not explicitly teach a 
method comprising: 

A) the miner allowing a crawling of the resulting resource locator string when the 
resulting resource locator string matches a plurality of web space rules. 

Heydon, however, teaches "the miner allowing a crawling of the resulting 
resource locator string when the resulting resource locator string matches a 
plurality of web space rules" as "The URL filtering mechanism provides a 
customizable way to control the set of URLs that are downloaded... The URL filter class 
has a single crawl method that takes a URL and returns a Boolean value indicating 
whether or not to crawl that URL" (Page 6, Section: 3.6: URL Filters). 
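
For illustration only, the filter behavior quoted from Heydon above (a filter with a single crawl method that takes a URL and returns a Boolean value) might be sketched in Python as follows. The class names `DomainFilter`, `PrefixFilter`, and `AllOf` are hypothetical and are not Mercator's actual classes; the sketch only shows how a URL could be allowed when it matches a plurality of rules.

```python
from urllib.parse import urlsplit


class DomainFilter:
    """Allow URLs whose host is the given domain or one of its subdomains."""

    def __init__(self, domain: str) -> None:
        self.domain = domain.lower()

    def crawl(self, url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        return host == self.domain or host.endswith("." + self.domain)


class PrefixFilter:
    """Allow URLs that begin with the given prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def crawl(self, url: str) -> bool:
        return url.startswith(self.prefix)


class AllOf:
    """Allow a URL only when every contained filter allows it."""

    def __init__(self, *filters) -> None:
        self.filters = filters

    def crawl(self, url: str) -> bool:
        return all(f.crawl(url) for f in self.filters)
```

A crawler built this way would call `crawl(url)` on the combined filter before queuing each extracted link and skip the link when the result is `False`.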

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Heydon's 
teachings would have allowed the combination of Chakrabarti, Liang, and Weiss to 
provide a scalable and customizable web crawler to fit a specific user's needs, as 
noted by Heydon (Page 2, Section: 2: Related Work). 

Regarding claim 11, Chakrabarti, Liang, and Weiss do not explicitly teach a 
method comprising: 

A) wherein the web space rules comprise domain rules, IP address rules, and prefix 
rules. 

Heydon, however, teaches "wherein the web space rules comprise domain 
rules, IP address rules, and prefix rules" as "Mercator includes a collection of 
different URL filter subclasses that provide facilities for restricting URLs by domain, 
prefix, or protocol type" (Page 6, Section: 3.6: URL Filters). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Heydon's 
teachings would have allowed the combination of Chakrabarti, Liang, and Weiss to 
provide a scalable and customizable web crawler to fit a specific user's needs, as 
noted by Heydon (Page 2, Section: 2: Related Work). 

Regarding claim 12, Chakrabarti does not explicitly teach a method comprising: 
A) the miner disallowing the crawling of the resulting resource locator string when the 
content of the resulting resource locator string matches a focus topic of the miner. 

Liang, however, teaches "the miner disallowing the crawling of the resulting 
resource locator string when the content of the resulting resource locator string 
matches a focus topic of the miner" as "Web spider 26 is preferably provided with a 
copy of the lexicon described above so as to permit it to recognize pornographic 
material" (Paragraph 62) and "if any page in a website is discovered as comprising 
pornographic material, all pages "below" that page in the sitemap for the website may 
be blocked" (Paragraph 68). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Liang's 
teachings would have allowed Chakrabarti's method to let web crawlers and spiders 
dynamically restrict unwanted and unacceptable material, as noted by Liang 
(Paragraph 3). 

Regarding claim 13, Chakrabarti does not explicitly teach a method comprising: 
A) wherein the miner comprises an unfocus miner that places the resulting uniform 
resource locator strings that match an unfocus topic in the blacklist, so that the uniform 
resource locator strings will not be crawled again. 

Liang, however, teaches "wherein the miner comprises an unfocus miner 
that places the resulting uniform resource locator strings that match an unfocus 
topic in the blacklist, so that the uniform resource locator strings will not be 
crawled again" as "web spider 26 determines whether the retrieved web content 
contains pornographic material. If it does, then in step 908, web spider 26 adds the 
URL to list 28" (Paragraph 63). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of the cited references because Liang's 
teachings would have allowed Chakrabarti's method to let web crawlers and spiders 
dynamically restrict unwanted and unacceptable material, as noted by Liang 
(Paragraph 3). 

Response to Arguments 

9. Applicant's arguments with respect to claims 1-20 have been considered but are 
moot in view of the new ground(s) of rejection. 

Conclusion 

10. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

U.S. Patent 6,199,081 issued to Meyerzon et al. on 06 March 2001. The subject 
matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to specifically 
crawl targeted subject matter). 

U.S. PGPUB 2004/0049514 issued to Burkov on 11 March 2004. The subject 
matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to specifically 
crawl targeted subject matter). 

U.S. PGPUB 2002/0194161 issued to McNamee et al. on 19 December 2002. 
The subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to 
specifically crawl targeted subject matter). 

U.S. Patent 6,754,873 issued to Law et al. on 22 June 2004. The subject matter 
disclosed therein is pertinent to that of claims 1-20 (e.g., methods to specifically crawl 
targeted subject matter). 

U.S. Patent 7,080,073 issued to Jiang et al. on 18 July 2006. The subject 
matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to specifically 
crawl targeted subject matter). 

U.S. PGPUB 2006/0277175 issued to Jiang et al. on 07 December 2006. The 
subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to 
specifically crawl targeted subject matter). 

U.S. Patent 6,993,534 issued to Denesuk et al. on 31 January 2006. The 
subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to 
specifically crawl targeted subject matter). 

U.S. Patent 6,295,559 issued to Emens et al. on 25 September 2001. The 
subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to 
specifically crawl targeted subject matter). 

U.S. PGPUB 2002/0032869 issued to Lamberton et al. on 14 March 2002. The 
subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to 
specifically crawl targeted subject matter). 

U.S. Patent 6,691,108 issued to Li on 10 February 2004. The subject matter 
disclosed therein is pertinent to that of claims 1-20 (e.g., methods to specifically crawl 
targeted subject matter). 

Contact Information 

11. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Mahesh Dwivedi whose telephone number is (571) 272-2731. 
The examiner can normally be reached on Monday to Friday 8:20 am - 4:40 pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Tim Vo, can be reached at (571) 272-3642. The fax number for the 
organization where this application or proceeding is assigned is (571) 273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Mahesh Dwivedi 
Patent Examiner 
Art Unit 2168 

August 08, 2008 
/Mahesh H Dwivedi/ 
Examiner, Art Unit 2168 

/Tim T. Vo/ 

Supervisory Patent Examiner, Art Unit 2168 



